An Empirical Study of Memory Hardware Errors in A Server Farm
Authors
Abstract
The integrity of system hardware is an important requirement for providing dependable services. Understanding the hardware’s failure mechanisms and the error rate is therefore an important step towards devising an effective overall protection mechanism to prevent service failure. In this paper we discuss an on-going case study of memory hardware failures of production systems in a server-farm environment. We present some preliminary results collected from 212 machines. Our observations under a normal, non-accelerated condition validate the existence of all failure modes modeled in the previous literature: single-cell, row, column, and whole-chip failures. We also provide a quantitative analysis of the error rates.
Similar resources
mcelog: memory error handling in user space
Servers and high-performance computing systems contain more and more memory to handle bigger data sets. But with more and larger memory modules, and more transistors in them, combined with larger clusters of systems, the rate of memory errors in operation is also increasing. Modern server systems generally use ECC memory and other ways to detect and correct many memory errors in the hardware. W...
Measuring the Impact of Memory Errors on Application Performance
Memory reliability is a key factor in the design of warehouse-scale computers. Prior work has focused on the performance overheads of memory fault-tolerance schemes when errors do not occur at all, and when detected but uncorrectable errors occur, which result in machine downtime and loss of availability. We focus on a common third scenario, namely, situations when hard but correctable faults ex...
Heterogeneous-Reliability Memory: Exploiting Application-Level Memory Error Tolerance
Recent studies estimate that server cost contributes to as much as 57% of the total cost of ownership (TCO) of a datacenter [1]. One key contributor to this high server cost is the procurement of memory devices such as DRAMs, especially for data-intensive datacenter cloud applications that need low latency (such as web search, in-memory caching, and graph traversal). Such memory devices, howeve...
An Empirical Analysis of Vertical Integration Determinants among Peasant Farmers in Northern Algeria
This study aims to analyze the determinants of vertical integration (ownership and contracting) among peasant farmers in Northern Algeria. The choice of asset control is between ownership and a simple contracting. Thus, the integration of vertical stages of agricultural production leads to higher gross margins, influences the choice of marketing and supply channels, and improves market partic...
Persistent Residual Increase in Server Processing Time
In this case study we present our observations of a query processing engine running at a server farm operated by one of our industrial partners. We examine the query engine response time (termed MSBFS) under a variety of conditions. Observations show that there is a persistent residual increase in the server processing time that is only reset with rebooting the hardware.